How to Quantitatively Compare Data Dissimilarities for Unsupervised Machine Learning?
Authors
Abstract
For complex data sets, the pairwise similarity or dissimilarity of data often serves as the interface between the application scenario and the machine learning tool. Hence, the final result of training is strongly influenced by the choice of the dissimilarity measure. While dissimilarity measures for supervised settings can ultimately be compared by the classification error, the situation is less clear in unsupervised domains, where a clear objective is lacking. The question arises of how to compare dissimilarity measures and their influence on the final result in such cases. In this contribution, we propose to use a recent quantitative measure, introduced in the context of unsupervised dimensionality reduction, to assess whether and on which scale two dissimilarities coincide for an unsupervised learning task. Essentially, the measure evaluates to what extent neighborhood relations are preserved when they are evaluated based on rankings, which makes the measure robust against scaling of the data. Apart from a global comparison, local versions make it possible to highlight regions of the data where two dissimilarity measures induce the same results.
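The abstract does not spell out the measure, so the following is only a minimal sketch of one possible rank-based comparison, assuming that agreement is scored as the overlap of K-nearest-neighbor sets induced by two dissimilarity matrices. The function names (`neighbor_ranks`, `local_agreement`, `global_agreement`) and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def neighbor_ranks(D):
    """Return ranks[i, j]: position of point j in the neighbor ordering of i.

    Rank 0 is reserved for the point itself; rank 1 is its nearest neighbor.
    """
    D = np.asarray(D, dtype=float).copy()
    np.fill_diagonal(D, -np.inf)  # force each point to rank first in its own row
    order = np.argsort(D, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(D.shape[0])[:, None]
    ranks[rows, order] = np.arange(D.shape[1])[None, :]
    return ranks

def local_agreement(D1, D2, k):
    """Per-point fraction of the k nearest neighbors shared by D1 and D2."""
    r1, r2 = neighbor_ranks(D1), neighbor_ranks(D2)
    in1 = (r1 >= 1) & (r1 <= k)  # k-neighborhoods under the first measure
    in2 = (r2 >= 1) & (r2 <= k)  # k-neighborhoods under the second measure
    return (in1 & in2).sum(axis=1) / k

def global_agreement(D1, D2, k):
    """Average of the local agreement over all points (assumed global score)."""
    return float(local_agreement(D1, D2, k).mean())

# Toy data (assumption): five points on a line, chosen so that no two
# distances within a row are tied.
x = np.array([0.0, 1.0, 3.0, 7.0, 15.0])
D_abs = np.abs(np.subtract.outer(x, x))  # absolute distance
D_sq = D_abs ** 2                        # monotone rescaling of D_abs

print(global_agreement(D_abs, D_sq, k=2))
print(local_agreement(D_abs, D_abs.max() - D_abs, k=1))
```

Because only rankings enter the score, any strictly monotone rescaling of a dissimilarity (here, squaring) leaves it unchanged, which illustrates the robustness to scaling mentioned above. Varying `k` probes the scale at which two measures agree, and the per-point values from `local_agreement` indicate where in the data they agree.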
Related resources
Prototype-Based Classification of Dissimilarity Data
Unlike many black-box algorithms in machine learning, prototype-based models offer an intuitive interface to given data sets, since prototypes can be inspected directly by experts in the field. Most techniques rely on Euclidean vectors, so their suitability for complex scenarios is limited. Recently, several unsupervised approaches have successfully been extended to general possibly non-Eu...
Using Machine Learning for Exploratory Data Analysis
This tutorial will introduce attendees to fundamental concepts in the clustering and dimensionality reduction fields of unsupervised machine learning. Attendees will learn about the assumptions algorithms make and how those assumptions can cause the algorithms to be more or less suited to particular datasets. Hands-on interaction with machine learning algorithms on real and synthetic data are a...
Unsupervised Feature Learning for low-level Local Image Descriptors
Unsupervised feature learning has shown impressive results for a wide range of input modalities, in particular for object classification tasks in computer vision. Using a large amount of unlabeled data, unsupervised feature learning methods are utilized to construct high-level representations that are discriminative enough for subsequently trained supervised classification algorithms. However, ...
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...